discussed before and there is no need to repeat the argument here. If the quality
of the data which constitute the program’s input is low the gain may be fairly 
slight, but diagnoses derived by a computer from unreliable data will still be at 
least as useful as those derived from the same data by other means. But the development
and progressive refinement of computer programs for generating diagnoses
have other less immediate but equally important consequences. By creating, as 
they do, a range of experimental models simulating the process by which 
clinicians arrive at diagnoses, they focus attention on the largely unexplored 
mechanisms of diagnostic pattern recognition and decision making. By producing,
as they also do, a variety of alternative diagnostic criteria, each of them
explicit and available for public scrutiny, they serve to focus attention on the 
neglected issue of the validity of our diagnoses. Consider, for example, a situation 
in which three different programs are in use simultaneously, all embodying 
different concepts of schizophrenia. In such a situation psychiatrists would be 
forced to consider which of the three concepts was the most useful, and in so 
doing to decide what their criterion of usefulness or validity was to be. Alternatively,
a situation might well arise in which one particular program came into
general use, perhaps in combination with a particular interviewing schedule or
rating scale. Such a development would have profound implications, much
greater than those accompanying the popularization of other instruments like
the MMPI or the Rorschach Test. For whoever exports his program also exports
his diagnostic criteria. We could well find that the diagnostic criteria currently 
used in one particular country, or even in a particular university department, 
might come into general use in the wake of the widespread adoption of a popular 
computer program embodying those criteria. For these and other reasons the 
application of computer technology to psychiatric diagnosis may prove to be a
development of much greater moment than is yet apparent. 

DECISION TREE PROGRAMS 

Three quite different kinds of computer program have been used for generating
diagnoses: those based on a logical decision tree, those based on probability
theory and those based on multiple discriminant functions. The decision tree
method is the easiest to understand for those with little knowledge of statistics
for the simple reason that it does not involve any. It consists purely of a series
of questions each of which has to be answered yes or no. Each successive answer
eliminates one or more diagnoses or groups of diagnoses and also determines the
next question asked, until every diagnosis but one has been eliminated. For
example, the first question, based on a group of items concerned with cognitive
functioning, might be used to determine whether the illness was organic or not.
If the answer was ‘no’ the second question, based perhaps on a series of items
about delusions and hallucinations, might determine whether the illness was
psychotic or neurotic, and so on. In this way every possible combination of
symptoms is reduced to one or other of the diagnoses recognized in the system. 
Individual questions may specify the presence of a single item, or that a score 
derived from several items should lie within a certain range, or be based on 
complex alternatives involving numerous items in different combinations. The 
formal structure of a decision tree is actually the same as that of a railway 
marshalling yard, with patients corresponding to individual trucks, the yes/no 
questions to sets of points and the trains at the bottom of the yard to diagnoses. 
When it starts its journey at the top of the yard each truck has the potential 
to join any of the trains, but each time it passes a set of points its choice 
becomes more restricted until eventually it is committed to one particular 
train. 
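
The marshalling-yard picture can be sketched in a few lines of Python. This is only an illustration of the general structure: the item names, thresholds and three diagnoses below are hypothetical and are not taken from any of the programs discussed in this chapter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Ratings = Dict[str, int]  # item name -> rating (hypothetical items)


@dataclass
class Question:
    """One set of points: a yes/no test leading to another question or a diagnosis."""
    test: Callable[[Ratings], bool]
    if_yes: Union["Question", str]
    if_no: Union["Question", str]


def diagnose(node: Union[Question, str], ratings: Ratings) -> str:
    # Pass through successive sets of points until a train (a diagnosis) is reached.
    while isinstance(node, Question):
        node = node.if_yes if node.test(ratings) else node.if_no
    return node


# Hypothetical two-question tree: organic or not, then psychotic or neurotic.
tree = Question(
    test=lambda r: r.get("cognitive_impairment", 0) >= 2,  # assumed threshold
    if_yes="organic",
    if_no=Question(
        test=lambda r: r.get("delusions", 0) + r.get("hallucinations", 0) >= 1,
        if_yes="psychotic",
        if_no="neurotic",
    ),
)

print(diagnose(tree, {"cognitive_impairment": 0, "delusions": 1}))
```

Because every path ends in exactly one leaf, each combination of ratings is allocated to one and only one diagnosis, which is the property described above.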

Diagno 

Spitzer’s Diagno (Spitzer and Endicott, 1968) was the first program of this type
to be developed. It is based on the thirty-nine scale scores of a structured mental
state interview known as the Psychiatric Status Schedule (see chapter 10) and
allocates every patient to one of twenty-seven diagnostic categories, including
‘not ill’ and ‘non-specific illness with mild symptomatology’. It was soon followed
by Diagno II (Spitzer and Endicott, 1969), a more complex program containing
fifty-seven decision points compared with Diagno’s thirty-six and incorporating
a limited capacity to revise decisions made at an earlier stage in the sequence.
This program utilizes historical data in addition to mental state information in
the form of the Current and Past Psychopathology Scales (CAPPS - see Endicott
and Spitzer, 1972) and generates a total of forty-six different diagnoses, including
personality disorders. In a study based on 100 sets of CAPPS ratings,
and using κw as an index of concordance, Spitzer and Endicott were able to
show that the diagnostic agreement between Diagno II and clinicians was as
good as that between one clinician and another, thus demonstrating the face
validity as well as the reliability of the computer’s diagnoses.
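
For readers unfamiliar with the index of concordance used in that study, the following is a minimal sketch of a chance-corrected agreement coefficient of the kappa type. It assumes that the index is Cohen's weighted kappa, computed from disagreement weights; the function name, the weighting argument and the four example diagnoses are all illustrative and do not come from Spitzer and Endicott's data.

```python
from collections import Counter


def kappa(rater_a, rater_b, disagreement=None):
    """Chance-corrected agreement between two lists of diagnoses.

    `disagreement(x, y)` gives the seriousness of confusing x with y.
    With the default (0 if equal, 1 otherwise) this is unweighted kappa.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    if disagreement is None:
        disagreement = lambda x, y: 0.0 if x == y else 1.0
    n = len(rater_a)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    # Mean disagreement actually observed between the two raters.
    observed = sum(disagreement(x, y) for x, y in zip(rater_a, rater_b)) / n
    # Mean disagreement expected if the raters assigned diagnoses independently.
    expected = sum(
        disagreement(x, y) * freq_a[x] * freq_b[y]
        for x in categories
        for y in categories
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is possible")
    return 1.0 - observed / expected


clinician = ["schizophrenia", "schizophrenia", "depression", "depression"]
computer = ["schizophrenia", "depression", "depression", "depression"]
print(kappa(clinician, computer))
```

A value of 1 indicates complete agreement and a value of 0 indicates agreement no better than chance.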

Catego 

More recently Wing and his colleagues (Wing, Cooper and Sartorius, 1974) have
developed a similar program, known as Catego, based on their structured Present
State Examination. The design of this program is rather different from that of
Diagno. Instead of single diagnoses or groups of diagnoses being eliminated one
by one, the original input, which consists of 350 PSE items, passes through a
progressive series of condensations and all decisions about actual diagnoses are
postponed until the final stage. The 350 items are first reduced to 140 ‘symptoms’
and these in turn reduced to thirty-five ‘syndromes’. Next these syndromes are
condensed to six ‘descriptive categories’. Up to this stage there is no restriction
on the number of elements which any individual patient may exhibit but in the
next stage, whether the patient has previously qualified for one descriptive
category or all six, his symptomatology is reduced to a single ‘descriptive class’.
Essentially the same procedure is carried out independently with other data (if
it is available) for all previous episodes of illness and the final ‘provisional
diagnostic class’ is then derived from the separate descriptive class assignments
of all episodes of illness, past or present. The Catego program prints out all the
symptoms, syndromes and descriptive categories exhibited by each individual
patient, together with a rough three point ranking (?, + and ++) for each, in
addition to the final ‘provisional diagnostic class’ or diagnosis. In this way much
useful information is provided in a standardized form which enables unusual
or borderline patients to be distinguished from those with typical symptom
patterns.
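
The staged condensation can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the rule tables are invented, a higher-level element is taken to be present if any one of its listed members is present, and the final class is chosen by a fixed priority order. The real Catego rules are far more elaborate and are not reproduced here.

```python
# Hypothetical rule tables: each higher-level element lists the lower-level
# elements that qualify a patient for it.
SYMPTOM_RULES = {
    "symptom_a": {"item_1", "item_2"},
    "symptom_b": {"item_3"},
    "symptom_c": {"item_4"},
}
SYNDROME_RULES = {
    "syndrome_x": {"symptom_a"},
    "syndrome_y": {"symptom_b", "symptom_c"},
}
CATEGORY_RULES = {
    "category_1": {"syndrome_x"},
    "category_2": {"syndrome_y"},
}
CLASS_PRIORITY = ["category_1", "category_2"]  # assumed ordering


def condense(present, rules):
    """Return every higher-level element with at least one member present."""
    return {name for name, members in rules.items() if present & members}


def catego_sketch(items):
    # No restriction on how many elements a patient exhibits at these stages.
    symptoms = condense(items, SYMPTOM_RULES)
    syndromes = condense(symptoms, SYNDROME_RULES)
    categories = condense(syndromes, CATEGORY_RULES)
    # Only at the last stage is the patient reduced to a single class.
    descriptive_class = next(
        (c for c in CLASS_PRIORITY if c in categories), "unclassified"
    )
    return {
        "symptoms": symptoms,
        "syndromes": syndromes,
        "categories": categories,
        "class": descriptive_class,
    }


print(catego_sketch({"item_1", "item_4"}))
```

Returning the intermediate sets alongside the final class mirrors the printed output described above, in which the symptoms, syndromes and descriptive categories remain visible.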

The potential of this and similar programs was well illustrated in the International
Pilot Study of Schizophrenia where it was used to derive standard
‘diagnoses’ from the PSE ratings of all 1200 patients from the nine countries
involved (WHO, 1973a). In this way, several important similarities and differences
in the range of patient types encountered in the nine countries were exposed,
and also important similarities and differences in the diagnostic criteria
of the local psychiatrists in each centre.

PROBABILISTIC METHODS 

The second approach is a probabilistic or statistical one based on Bayes’ theorem.
The basic statement of this theorem is:

$$P(d_i \mid s_j) = \frac{P(d_i)\,P(s_j \mid d_i)}{\sum_{k} P(d_k)\,P(s_j \mid d_k)}$$

where

$P(d_i \mid s_j)$ is the probability that a patient with the constellation of symptoms
$s_j$ has the disease $d_i$.

$P(d_i)$ is the probability (or incidence) of the disease $d_i$ in the population
under consideration.

$P(s_j \mid d_i)$ is the incidence of the symptoms $s_j$ in the disease $d_i$.

$P(d_k)$ is the incidence of each disease $1 \to k$ in the population.

$P(s_j \mid d_k)$ is the incidence of the symptoms $s_j$ in each of the diseases $1 \to k$.

Most of the early programs for deriving psychiatric diagnoses by computer
were of this type (Birnbaum and Maxwell, 1961; Overall and Gorham, 1963;
Overall and Hollister, 1964; Smith, 1966) but, in spite of the obvious relevance of
probability theory to diagnosis, the Bayesian model has several disadvantages.
It assumes that each of the symptoms 1 → j is independent of the others and that
the diseases 1 → k are similarly independent of one another. In fact, neither of
these assumptions is justified, though symptom independence can be artificially
produced by replacing the actual ratings by principal components derived from
them. The Bayesian model also requires a reasonable estimate of the incidence
of the various symptoms 1 → j in each of the diseases 1 → k in the population
under consideration, and a similar estimate of the relative frequency of these
diseases in that population. In practice these data are rarely available and both
Overall and Smith were forced to resort to the questionable procedure of using
ratings of hypothetical typical cases to provide their estimates of the distributions
of symptoms across diseases, and also to assume that each disease was equally
probable.
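
A minimal sketch of such a calculation is given below. It makes exactly the two simplifying assumptions criticized above: the symptoms are treated as independent within each disease, so that the incidence of a constellation is the product of the incidences of its members, and the diseases are taken to be equally probable unless prior probabilities are supplied. The disease names, symptom names and incidences are invented for illustration.

```python
from math import prod


def posterior(symptoms_present, likelihoods, priors=None):
    """Return P(disease | symptoms) for every disease by Bayes' theorem.

    likelihoods[d][s] is the assumed incidence of symptom s in disease d.
    Symptoms listed for a disease but absent in the patient contribute 1 - p.
    """
    diseases = list(likelihoods)
    if priors is None:
        # Equal-probability assumption, as in the early programs.
        priors = {d: 1.0 / len(diseases) for d in diseases}
    joint = {
        d: priors[d]
        * prod(
            p if s in symptoms_present else 1.0 - p
            for s, p in likelihoods[d].items()
        )
        for d in diseases
    }
    total = sum(joint.values())  # the denominator of the theorem
    if total == 0:
        raise ValueError("symptom pattern is impossible under every disease")
    return {d: joint[d] / total for d in diseases}


# Hypothetical incidences of two symptoms in two diseases.
likelihoods = {
    "disease_A": {"s1": 0.8, "s2": 0.5},
    "disease_B": {"s1": 0.2, "s2": 0.5},
}
print(posterior({"s1"}, likelihoods))
```

The output is a probability for every disease, not a single verdict, so the second and third most likely diagnoses are available as well as the first.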

DISCRIMINANT FUNCTIONS 

The third group of techniques is based on the discriminant function procedures
introduced by Fisher (1936) and Rao (1948). They are described in more detail
in chapter 8, but in its simplest form discriminant analysis involves two populations,
one whose members have been assigned a clinical diagnosis A and the other
a diagnosis B, and all of whom have been rated for the presence or absence of N
items relevant to the distinction between A and B. Starting with these data the
analysis produces a linear variable (the discriminant function) consisting of a set of
weights for the N items calculated so as to maximize the ratio of between-group
to within-group variance. As a result, when a score is derived for each patient
by adding together his weighted scores on the N items, the separation between
those with diagnosis A and those with diagnosis B is maximal. Subsequently,
this discriminant function can be used to allocate any patient who has been rated
on the N items to the appropriate diagnosis, A or B. In practice several diagnoses
are usually involved, not just two, which means using a multiple stepwise discriminant
procedure, but the basic principle remains unchanged. Techniques of
this sort have been used by Melrose, Stroebel and Glueck (1970) in Connecticut
and Sletten, Ulett, Altman and Sunderland (1970) in Missouri. The latter at
least were able to obtain a level of agreement between clinical and computer
diagnoses comparable to that achieved by Spitzer with Diagno II and have since
developed this computer service as a routine procedure in all the psychiatric
hospitals in the Missouri State system (Sletten, Altman and Ulett, 1971). Using
data from a mental state examination and standard demographic information
the central computer provides for each patient within minutes, or at most a few
hours, the probabilities of that patient belonging to each of eight broad diagnostic
groupings (acute organic brain syndrome, paranoid schizophrenia, personality
disorder, etc.).
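
The two-group case can be sketched as follows. The weights are obtained in the standard way for Fisher's criterion, by solving the pooled within-group matrix against the difference between the group means. The cutoff midway between the two mean scores is an added assumption (equal group frequencies and equal costs of error), and the ratings are invented. It is not the multiple stepwise procedure used in Connecticut or Missouri.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, mean):
    """Within-group sums of squares and products about the group mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("within-group matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_discriminant(group_a, group_b):
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, mean_a), scatter(group_b, mean_b)
    within = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sa, sb)]
    weights = solve(within, [a - b for a, b in zip(mean_a, mean_b)])
    score = lambda row: sum(w * x for w, x in zip(weights, row))
    cutoff = (score(mean_a) + score(mean_b)) / 2.0  # assumed midpoint rule
    return weights, cutoff


def allocate(row, weights, cutoff):
    return "A" if sum(w * x for w, x in zip(weights, row)) > cutoff else "B"


# Hypothetical ratings on N = 2 items for four patients in each group.
group_a = [[2, 1], [3, 2], [3, 0], [4, 1]]
group_b = [[0, 2], [1, 3], [1, 1], [2, 2]]
weights, cutoff = fit_discriminant(group_a, group_b)
print(weights, cutoff, allocate([3, 1], weights, cutoff))
```

Even in this toy form the dependence on developmental data is plain: the weights come entirely from the rated patients in the two groups, so their quality is bounded by the quality of those ratings and diagnoses.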

The relative merits of the three approaches

It is debatable which of these three approaches corresponds most closely to the 
reasoning processes employed by clinicians. Claims have been made on behalf 
of all three, each of them with some justification. Really we know too little about 
the decision-making processes of clinicians to decide, and it may well be that 
they use different strategies in different situations. The hierarchical nature of 
most of our classifications, with each diagnosis excluding the presence of those 
that precede it and encompassing the symptoms of those that follow it, strongly 
suggests a sequence of decisions akin to that of a decision tree. On the other 
hand, clinicians are clearly influenced by considerations of relative probability 
comparable to those embodied in Bayes’ Theorem. And when concerned with a 
particular differential diagnosis they may well allocate rough weights to the 
symptoms suggesting each of the two diagnoses and compare them in much the 
same way as a simple discriminant procedure does. 

It is also arguable which of the three is the most useful. The decision tree
method is the simplest, and also the easiest to construct, but each program is 
usable only with the particular rating scale or structured interview for which it 
was designed and individual diagnostic distinctions are necessarily based on 
rather crude criteria. The two statistical procedures share the important advantage
that they provide not just a single diagnosis but an estimate, expressed
as a probability by the Bayes method and as a distance by the discriminant
function method, of how closely the patient resembles typical members of
several different categories, thus allowing meaningful alternative diagnoses to
be provided and distinguishing typical cases from those with unusual or borderline
symptoms. However, both have disadvantages also. As we have already
seen, Bayes’ theorem makes the unjustified assumption that both symptoms and 
diagnoses are independent of one another, and also requires prior knowledge of 
the distributions of symptoms across diseases and, to achieve its full potential, 
prior knowledge of the relative incidence of the diseases under consideration as 
well. Discriminant function procedures start with several advantages. They 
involve fewer unfulfilled assumptions than the probabilistic approach, can handle 
numeric data without having to break them up into arbitrary nominal groups 
as the other two methods do, and have the ability to focus large amounts of
data onto individual diagnostic discriminations to optimum effect. Their big
disadvantage is that the linear functions they utilize have to be derived in the
first place from ratings on large populations of patients, the size of the requisite
population being governed by the product of the number of separate ratings or 
scores being used and the number of diagnostic categories to be distinguished. 
In practice, this means that unless a thousand or more sets of ratings are available
one either has to confine oneself to distinguishing a small number of broad
diagnostic groups, or make unjustified assumptions about variances and correlations
across diagnostic categories and so fail to achieve anything like maximal
discrimination. A second problem common to both the statistical methods is
that, because their ground rules are derived in the first place from clinical ratings
and diagnoses, the short-comings of these data are incorporated into the resulting
program. If the initial data are unreliable and biased the program’s rules
will necessarily be in some respects inappropriate, and its discriminating power
blunted as a result. This has two important consequences. The failure of a
Bayesian or discriminant function program to generate appropriate diagnoses
may be due to the short-comings of the clinical data from which it was derived
rather than to the inherent short-comings of the statistical method. Conversely,
an improvement in the quality of the original developmental data may be expected
to result in improved performance by the program.

Fleiss and his colleagues (Fleiss, Spitzer, Cohen and Endicott, 1972) have
recently compared the efficacy of a logical decision tree program (Diagno II), a
Bayesian program and a multiple discriminant function procedure at distinguishing
twelve broad diagnostic categories in a series of 740 patients rated on
the CAPPS. Over half these patients had to be used for developing the statistical
rules of the Bayesian and discriminant function programs and the actual comparison
was therefore restricted to the remaining 286 patients. Using κw as an
index of the degree of concordance between the original clinical diagnoses and
the corresponding computer categories they found little difference between the
three approaches; κw lay between 0.43 and 0.48 for all three. However, the
discriminant function program was less successful than the other two in reproducing
the percentage distribution of the clinicians’ diagnoses, mainly because
it overdiagnosed paranoid schizophrenia at the expense of non-paranoid forms.
When a second comparison was carried out on quite different data (CAPPS
ratings obtained from a series of 435 women from an obstetric ward, and so with
a much lower overall psychiatric morbidity than the previous material) Diagno
came out best with an average κw for concordance with the clinicians’ diagnoses
of 0.36, compared with 0.28 for the discriminant function program, and 0.20 for
the Bayesian approach. The authors concluded from these results that ‘at the
present time, a logical decision tree method such as Diagno II is preferable for
computer diagnosis to the Bayes and discriminant function methods’. This is a
fair assessment of the current situation, though it is likely that discriminant function
procedures will eventually prove superior once the practical problems of obtaining
sufficiently large series of patients for developmental purposes have been
overcome. The appropriate choice in any given situation will also be influenced by
other considerations peculiar to that situation: how valuable it would be to have
alternative diagnoses available as well as a single ‘first choice’ diagnosis, whether
a wide range of separate diagnoses are needed or only an accurate assignment
to a few major categories, and whether or not sufficient data are available to
provide an adequate developmental sample for either of the statistical methods.